In Spark, calling the emptyRDD() method on the SparkContext object creates an empty RDD with no partitions and no elements. The examples below create an empty RDD.
Create empty RDD
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[3]")
  .appName("RDD")
  .getOrCreate()

// emptyRDD without a type parameter yields RDD[Nothing];
// supply a type parameter to get a typed empty RDD
val rdd = spark.sparkContext.emptyRDD
val rddString = spark.sparkContext.emptyRDD[String]

println(rdd)
println(rddString)
println("Num of Partitions: " + rdd.getNumPartitions)
Create empty RDD with Partition
Using Spark's sc.parallelize() we can create an empty RDD with partitions; writing a partitioned RDD to a file produces one part file per partition.
// parallelize an empty Seq; the RDD gets the default number of partitions
val rdd = spark.sparkContext.parallelize(Seq.empty[String])
println(rdd)
println("Num of Partitions: " + rdd.getNumPartitions)
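To see the partition-to-part-file relationship described above, here is a short sketch. It assumes the SparkSession `spark` created earlier; the partition count of 4 and the output path are illustrative choices, not requirements.

```scala
// Create an empty RDD with an explicit number of partitions (4 here, chosen for illustration)
val emptyWithParts = spark.sparkContext.parallelize(Seq.empty[String], 4)
println("Num of Partitions: " + emptyWithParts.getNumPartitions)

// Each partition becomes one part file, so this writes four empty part files
// (path is a hypothetical example; pick any location that does not already exist)
emptyWithParts.saveAsTextFile("/tmp/empty-rdd-out")
```

Listing the output directory afterwards should show part-00000 through part-00003, each empty, alongside the _SUCCESS marker.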
Create empty pair RDD
Most often we work with RDDs of key-value pairs, so here is another example that creates an empty pair RDD.
This example creates an empty RDD of String and Int pairs.
type pairRDD = (String, Int)
val resultRDD = spark.sparkContext.emptyRDD[pairRDD]
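An empty pair RDD is typically useful as a neutral starting point for aggregations. The sketch below, which assumes the `spark` session from earlier (the sample data and variable names are illustrative), shows that pair operations such as union and reduceByKey work on it as expected.

```scala
type pairRDD = (String, Int)

// Empty pair RDD as a seed value
val empty = spark.sparkContext.emptyRDD[pairRDD]

// Illustrative sample data to combine with the empty RDD
val data = spark.sparkContext.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

// Union with an empty RDD contributes nothing; reduceByKey then sums values per key
val counts = empty.union(data).reduceByKey(_ + _)
counts.collect().foreach(println)
```

Because the empty RDD here is typed as (String, Int), the compiler accepts the union and the pair-RDD operations; an untyped emptyRDD (RDD[Nothing]) would not.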